\input memo.tex[let,jmc]
\title{MATHEMATICAL LOGIC IN ARTIFICIAL INTELLIGENCE}
This article concerns computer programs that represent information
about their problem domains in mathematical logical languages and use
logical inference to decide what actions are appropriate to achieve their
goals. We don't discuss logical theorem proving as an AI problem
domain, although this important domain of AI is obviously
related to the efficiency of the use of logic.
The idea of using mathematical logic to express information
about the common sense world can be traced back to Leibniz, but
there have been persistent difficulties, and mathematical logic
has been far more successful in representing assertions in pure
mathematics.
Moreover, the interests of mathematical logicians today
have become metamathematical rather than mathematical, and
the logical systems presented in textbooks today are best
suited to be the subject of informal metamathematical proofs about
what can be proved in them rather than for expressing mathematical
facts or facts about the world.
(McCarthy 1960) contains the first proposals to use logic in AI,
but the AI use of logic also encountered difficulties. Some of the
problem is to make the computation efficient, but the more serious
problems are epistemological --- deciding what knowledge about the
world is required for common sense reasoning, how to express it
formally, and what kinds of inference are appropriate.
These were only
sporadically attacked at first, but more persistent efforts since the
middle 1970s have met some success and led to some extensions of logic,
e.g. to non-monotonic logics. Further extensions to logic itself
seem to be necessary.
The 1960 paper said:
\begingroup\narrower\narrower
% COMMON.TEX[E80,JMC] TeX version Programs with Common Sense
The {\it advice taker} is a proposed program for solving problems by
manipulating sentences in formal languages. The main difference between
it and other programs or proposed programs for manipulating formal
languages (the {\it Logic Theory Machine} of Newell, Simon and Shaw and
the Geometry Program of Gelernter) is that in the previous programs the
formal system was the subject matter but the heuristics were all embodied
in the program. In this program the procedures will be described as much
as possible in the language itself and, in particular, the heuristics are
all so described.
The main advantages we expect the {\it advice taker} to have is that
its behavior will be improvable merely by making statements to it,
telling it about its symbolic environment and what is wanted from it. To
make these statements will require little if any knowledge of the program
or the previous knowledge of the {\it advice taker}. One will be able to
assume that the {\it advice taker} will have available to it a fairly wide
class of immediate logical consequences of anything it is told and its
previous knowledge. This property is expected to have much in common with
what makes us describe certain humans as having {\it common sense}. We
shall therefore say that {\it a program has common sense if it automatically
deduces for itself a sufficiently wide class of immediate consequences of
anything it is told and what it already knows.}
\par\endgroup
The main reason for using logical sentences extensively in AI
is realized by more researchers today than in 1960.
Expressing information in declarative
sentences is far more modular than expressing it in segments of computer
program or in tables. Sentences can be true in much wider contexts than
specific programs can be useful. The supplier of a fact does not have
to understand much about how the receiver functions or how or whether
the receiver will use it. The same fact can be used for many purposes,
because the logical consequences of collections of facts can be available.
The {\it advice taker} prospectus was ambitious in 1960, would be
considered ambitious today and is still far from being immediately
realizable. The rest of this paper is largely concerned with describing
what progress has been made, what the obstacles are, and how the
prospectus has been modified in the light of what has been discovered.
The formalisms of logic can be used to differing extents, mostly
much less ambitious, and we'll begin by recounting some of them.
1. A machine may use no logical sentences --- all its ``beliefs''
being implicit in its state. Nevertheless, it is often appropriate to
ascribe beliefs and goals to the program, i.e. to remove the above
sanitizing quotes, and to use a principle of rationality --- {\it It does
what it thinks will achieve its goals}. Such ascription is discussed in
(Dennett 1971), (McCarthy 1979) and (Newell 1981). The advantage is that
the intent of the machine's designers and the way it can be expected to
behave may be more readily described {\it intentionally}
than by a purely physical
description.
The relation between the physical and the {\it intentional}
descriptions is most readily understood in simple systems that admit
readily understood descriptions of both kinds, e.g. thermostats. Some
finicky philosophers object to this, contending that unless a system has a
full human mind, it shouldn't be regarded as having any mental qualities
at all. This is like omitting the numbers 0 and 1 from the number system
on the grounds that numbers aren't required to count sets with no elements
or one element. There is much more that can be said about ascribing
mental qualities to machines, but that's not where the main action is in
AI.
2. The next level of use of logic involves computer programs that
use sentences in machine memory to represent their beliefs but use
rules other than ordinary logical inference to reach conclusions.
New sentences are often obtained from the old ones
by ad hoc programs. Moreover, the sentences that appear in memory are
from a program-dependent subset of the logical language being used.
Adding certain true sentences in the language may even spoil the functioning of
the program. The languages used are often rather unexpressive compared to
first order logic, for example they may not admit quantified sentences,
and they may represent ``rules'', e.g. the equivalent of universally
quantified sentences in a separate notation. Often rules cannot arise as
consequences of the program's own reasoning; they must all have been put in by the
``knowledge engineer''. Sometimes the reason programs have this form is
just ignorance, but the usual reason for the restriction is the practical
one of making the program run fast and deduce just the kinds of
conclusions its designer anticipates. Most often the rules are
implications used in just one direction, e.g. the contrapositive of an
implication is not used. We believe the need for such specialized inference
will turn out to be temporary and will be reduced or eliminated by
improved ways of controlling general inference, e.g. by allowing the
heuristic rules to be also expressed as sentences as advocated in
the above extract from the 1960 paper.
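
As a schematic illustration (the predicates here are invented for the
example), such a program may use the rule
$$wet ⊃ slippery$$
only to pass from $wet$ to $slippery$; given $¬slippery$ it will not
conclude $¬wet$, although that contrapositive inference is logically valid.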
3. The third level uses first order logic and also logical
deduction. Typically the sentences are represented as clauses, and the
deduction methods are based on J. Allen Robinson's (1965) method of
resolution. It is common to use a theorem prover as a problem solver,
i.e. to determine an $x$ such that $P(x)$ as a byproduct of a proof of
$∃x.P(x)$. This level is less used for practical purposes than level two,
because techniques for controlling the reasoning are still insufficiently
developed, and it is common for the program to generate many useless
conclusions before reaching the desired solution. Indeed unsuccessful
experience (Green 1969) with this method led to more restricted uses of
logic, e.g. the STRIPS system of (Fikes and Nilsson 1971).
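
To illustrate the answer extraction mentioned above (the predicates are
invented for the example): given the facts
$$at(Box1,Home) \hbox{ and } ∀x.(at(x,Home) ⊃ reachable(x)),$$
a resolution proof of $∃x.reachable(x)$ unifies $x$ with $Box1$, and this
substitution, read off from the proof, is the desired answer.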
The commercial ``expert system shells'', e.g. ART, KEE and
OPS-5, use logical representation of facts, usually ground facts only,
and separate facts from rules. They provide elaborate but not always
adequate ways of controlling inference.
In this connection it is important to mention logic programming,
first introduced in Microplanner (Sussman et al., 1971)
and from different points of view by Robert Kowalski (1979) and Alain
Colmerauer in the early 1970s.
A recent text is (Sterling and Shapiro 1986). Microplanner
was a rather unsystematic collection of tools, whereas Prolog relies
almost entirely on one kind of logic programming, but the main idea
is the same. If one uses a restricted class of sentences, the so-called
Horn clauses, then it is possible to use a restricted form of logical
deduction, the control problems are much eased, and it is possible
for the programmer to anticipate the course the deduction will take.
The price paid is that only certain kinds of facts are conveniently
expressed as Horn clauses, and the depth first search built into
Prolog is not always appropriate for the problem. Nevertheless,
expressibility in Horn clauses is an important property of a set
of facts, and logic programming has been successfully used for many
applications, although it seems unlikely to dominate AI programming.
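
For illustration (our example, not drawn from any particular application),
the following sentences are Horn clauses:
$$parent(Abraham,Isaac),$$
$$∀x y.(parent(x,y) ⊃ ancestor(x,y)),$$
$$∀x y z.(parent(x,y) ∧ ancestor(y,z) ⊃ ancestor(x,z)),$$
and a goal such as $∃z.ancestor(Abraham,z)$ can be answered by the
restricted deduction mentioned above. On the other hand, a sentence whose
clausal form has two positive literals, e.g.
$∀x.(person(x) ⊃ male(x) ∨ female(x))$, is not a Horn clause and is not
conveniently expressed in this setting.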
Although third level systems express both facts and rules
as logical sentences, they are still rather specialized. The axioms
with which the programs begin are not general truths about the world
but are sentences whose meaning and truth is limited to the narrow
domain in which the program has to act. For this reason, the ``facts''
of one program usually cannot be used in a database for other programs.
4. The fourth level is still a goal. It involves representing
general facts about the world as logical sentences. Once put in
a database, the facts could be used by any program. The facts would
have the neutrality of purpose characteristic of much human information.
The supplier of information would not have to understand
the goals of the potential user or how his mind works. The present
ways of ``teaching'' computer programs amount to ``education
by brain surgery''.
A major difficulty is that fourth level systems require extensions
to mathematical logic. One kind of extension is non-monotonic
reasoning, first proposed in the late 1970s (McCarthy 1977, 1980, 1986),
(Reiter 1980), (McDermott and Doyle 1980). Traditional logic is monotonic
in the following sense. If a sentence $p$ is inferred from a collection
$A$ of sentences, and $B$ is a more inclusive set of sentences (symbolically
$A ⊂ B$), then $p$ can be inferred from $B$. If the inference is logical
deduction, then exactly the same proof that proves $p$ from $A$ will
serve as a proof from $B$. If the inference is model-theoretic, i.e.
$p$ is true in all models of $A$, then $p$ will be true in all models of $B$,
because the models of $B$ will be a subset of the models of $A$. So we
see that the monotonic character of traditional logic doesn't depend
on the details of the logical system but is quite fundamental.
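
Stated symbolically (this merely restates the above), monotonicity is the
property
$$A ⊢ p \hbox{ and } A ⊂ B \hbox{ imply } B ⊢ p,$$
and the same holds with provability replaced by the model-theoretic
consequence relation.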
While much human reasoning corresponds to that of traditional
logic, some important human common sense reasoning seems not to be
monotonic. We reach conclusions from certain premisses that we would not
reach if certain other sentences were included in our premisses. For
example, learning that I own a car, you conclude that it is appropriate on
a certain occasion to ask me for a ride, but when you learn the further
fact that the car is in the garage being fixed you no longer draw that
conclusion. It is possible to try to save monotonicity by saying that
what was in your mind was not a general rule about asking for a ride from
car owners but a probabilistic rule. It has not proved fruitful to try to
work out the detailed epistemology of this approach, i.e. exactly what
probabilistic sentences should be used. Instead AI has moved to directly
formalizing non-monotonic logical reasoning.
Formalized non-monotonic reasoning is under rapid development
and many kinds of systems have been proposed. I shall concentrate on
an approach called circumscription, because I know it, and because it
has met with wide acceptance and is perhaps the most actively pursued
at present. The idea is to single out among the models of the collection
of sentences being assumed some ``preferred'' or ``standard'' models.
The preferred models are those that satisfy a certain minimum principle.
What is to be minimized is not yet decided in complete generality,
but many domains that have been studied yield quite general theories
using minimization of abnormality or of the set of entities of some kind.
The idea is not completely unfamiliar. For example, Ockham's razor ``Do
not multiply entities beyond necessity'' is such a minimum principle.
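
In its simplest form, following (McCarthy 1980, 1986), circumscribing a
predicate $P$ in a sentence $A(P)$ means conjoining to $A(P)$ the second
order minimality condition
$$∀P'.(A(P') ∧ ∀x.(P'(x) ⊃ P(x)) ⊃ ∀x.(P(x) ⊃ P'(x))),$$
which says that no predicate $P'$ satisfying $A$ has a smaller extension
than $P$; the preferred models are then those in which $P$ is minimal.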
Minimization in logic is another example of an area of mathematics
being discovered in connection with applications rather than via
the normal internal development of mathematics. Of course, the reverse
is happening on an even larger scale; many logical concepts developed
for purely mathematical reasons turn out to have AI importance.
As a more concrete example of non-monotonic reasoning, consider
the conditions under which a boat may be used to cross a river. We all
know of certain things that might be wrong with a boat, e.g. a leak, no
oars or motor or sails depending on what kind of a boat it is. It would
be reasonably convenient to list some of them in a set of axioms.
However, besides those that we can expect to list in advance, a human
reasoner will admit still others, should they arise, but cannot be
expected to think of them all in advance, e.g. a fence down the middle of the
river. This is handled using circumscription by minimizing the set of
``things that prevent the boat from crossing the river''. If the reasoner
knows of none in a particular case, he will conjecture that the boat can
be used, but if he learns of one, he will get a different result when he
minimizes.
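
Schematically (a sketch of ours, not a complete axiomatization): let
$prevents(x)$ mean that $x$ prevents the boat from crossing, and suppose
the only axiom connecting it with the conclusion is
$$(∀x.¬prevents(x)) ⊃ crossable.$$
Circumscribing $prevents$ in a theory containing no known preventers
yields $∀x.¬prevents(x)$ and hence $crossable$; learning $prevents(Fence)$
changes the minimal models, and $crossable$ no longer follows.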
This illustrates the fact that non-monotonic reasoning
is conjectural rather than rigorous. Indeed there are metatheorems
that certain mathematical logical systems cannot be rigorously extended,
i.e. that they have a certain kind of completeness.
It is as misleading to conduct a discussion of this kind
entirely without formulas as it would be to discuss the foundations
of physics without formulas. However, many people are unaware of
this fact. Therefore, we present a formalization
by Vladimir Lifschitz (1987) of a simple example called ``The Yale shooting
problem''. Drew McDermott (1987), who has become discouraged
about the use of logic in AI and especially about the non-monotonic
formalisms, invented it as a challenge. Some earlier styles of
axiomatizing facts about change don't work right on this problem.
Lifschitz's method works well
in this case, but I think it will require further modification.
In an initial situation there is an unloaded gun and a person
Fred. The gun is loaded, then there is a wait, and then the gun
is pointed at Fred and fired. The desired conclusion is the death
of Fred. Informally, the rules are (1) that a living person remains alive until
something happens to him, (2) that loading causes a gun to become loaded,
(3) that a loaded gun remains loaded until something unloads it, (4) that
shooting unloads a gun and (5) that shooting a loaded gun at a person
kills him. We are intended to reason as follows. Fred will remain alive
until the gun is fired, because nothing can be inferred to happen to
him. The gun will remain loaded until it is fired, because nothing
can be inferred to happen to it. Fred will then die when the gun is
fired. The non-monotonic part of the reasoning is minimizing ``the
things that happen'' or assuming that ``nothing happens without a reason''.
The logical sentences are intended to express the above 5 premisses,
but they don't explicitly say that no other phenomenon occurs. For example,
it isn't asserted that Fred isn't wearing a bullet proof vest, nor are
any properties of bullet proof vests mentioned. Nevertheless, a human
will conclude that unless some unmentioned aspect of the situation
is present, Fred will die. The difficulty is that the sentences admit
an {\it unintended model}, to use the terminology of mathematical logic.
Namely, it might happen that for some unspecified reason the gun becomes
unloaded during the wait, so that Fred remains alive. When the problem
is formulated in the way non-monotonic formalisms, e.g. circumscription
and Reiter's logic of defaults, were previously used,
minimizing ``abnormality'' results in two possibilities,
not one. The unintended possibility is that the gun mysteriously
becomes unloaded.
Lifschitz's axioms use the situation calculus of (McCarthy and
Hayes 1969) but introduce a predicate $causes$ as an undefined notion.
We quote from (Lifschitz 1987).
``Our axioms for the shooting problem can be classified into three groups.
The first group describes the initial situation:
%
$$holds(alive,S0),\eqno(Y1.1)$$
$$¬ holds(loaded,S0).\eqno(Y1.2)$$
%
The second group tells us how the fluents are affected by actions:
%
$$causes(load,loaded,true),\eqno(Y2.1)$$
$$causes(shoot,loaded,false),\eqno(Y2.2)$$
$$causes(shoot,alive,false).\eqno(Y2.3)$$
%
These axioms describe the effects of
{\it successfully performed} actions;
they do not say {\it when} an action can be successful.
This information is supplied separately:
%
$$precond(loaded,shoot).\eqno(Y2.4)$$
%
The last group consists of two axioms of a more general nature. We use
the abbreviations:
%
$$success(a,s) ≡ ∀f(precond(f,a) ⊃ holds(f,s)),$$
$$affects(a,f,s) ≡ success(a,s) ∧ ∃v\;causes(a,f,v).$$
%
One axiom describes how the value of a fluent changes after an action affecting
this fluent is carried out:
%
$$success(a,s) ∧ causes(a,f,v) ⊃ (holds(f,result(a,s)) ≡ v=true).\eqno(Y3.1)$$
%
(Recall that $v$ can take on two values here, $true$ and $false$; the
equivalence in $Y3.1$ reduces to $holds(f,result(a,s))$ in the first case and
to the negation of this formula in the second.)
If the fluent is not affected then its value remains the same:
%
$$¬affects(a,f,s) ⊃ (holds(f,result(a,s)) ≡ holds(f,s)).\eqno(Y3.2)\hbox{''}$$
Minimizing $causes$ and $precond$ makes the right thing happen.
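
Spelling this out (the chain of reasoning is ours, though the conclusion
is Lifschitz's): let $S1 = result(load,S0)$, $S2 = result(wait,S1)$ and
$S3 = result(shoot,S2)$. Minimizing $precond$ makes (Y2.4) the only
precondition, so $success(load,S0)$ holds, and (Y2.1) with (Y3.1) gives
$holds(loaded,S1)$. Minimizing $causes$ makes $wait$ cause nothing, so
$¬affects(wait,loaded,S1)$, and (Y3.2) gives $holds(loaded,S2)$. Then
(Y2.4) gives $success(shoot,S2)$, and (Y2.3) with (Y3.1) yield
$$¬holds(alive,S3),$$
i.e. Fred dies; the model in which the gun mysteriously becomes unloaded
during the wait does not survive the minimization.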
While these axioms and this {\it circumscription policy} solve this problem, it
remains to be seen whether we can write a large body of common sense
knowledge in the formalism without getting other unpleasant surprises.
Another current question is whether we can get by with axioms about the
external world only or whether the axioms must contain information about
the purposes of the reasoning in order to determine the preferred models.
Moreover, there are many more possibilities to explore for the formal
minimum principle required for common sense reasoning.
It seems unlikely that introducing non-monotonic reasoning
will be the only modification to logic that will be required in order
to give machines human capability for common sense reasoning.
In order to make programs that reason about their own
knowledge and belief, i.e. have even rudimentary consciousness,
it is necessary to formalize many {\it intensional} notions, e.g.
knowledge and belief. (McCarthy 1979a) formalizes some of them
in first order logic by introducing propositions and individual
concepts as individuals. Complicating such efforts are the
paradoxes discovered by Montague (1963). It will be necessary
to weaken the axioms suitably to avoid them, but a good way of
doing it hasn't yet been found.
It seems also to be necessary to formalize the notion of context,
but this is in a very preliminary state of investigation.
\noindent {\bf Remarks}
Many of these remarks involve stating a position on issues
that are controversial even within AI.
1. Artificial intelligence is best regarded as a branch of
computer science rather than as a branch of psychology. Namely, it
is concerned with methods of achieving goals in situations in which
the information available has a certain complex character. The methods
that have to be used are related to the problem presented by the situation
and are similar whether the problem solver is human, a Martian or a
computer program.
Initially some people were over-optimistic about how long it
would take to achieve human level intelligence. Optimism was natural,
because only a few of the difficulties had been identified. Enough
difficulties have been identified by now to establish AI as one of the
more difficult sciences. Maybe it will take five years to achieve
human level intelligence, and maybe it will take five hundred.
2. It is still not clear how to characterize situations in which
intelligence is required. Evidently they have a certain open-ended
character. Even in a game like chess where the rules are fixed, the
methods for deciding on a move have an open-ended character --- new
ways of thinking about chess positions are invented all the time.
3. AI has so far identified certain methods of pattern matching,
heuristic search of trees of possibilities, representation of information
by rules and learning. Other methods are still to be characterized,
especially methods of representing problems as collections of subproblems
that can be examined separately to get certain results that can then
be used in studying their interactions.
4. The logic approach to AI is not the only one that may lead
to success. For example, approaches more closely tied to biology
may succeed eventually, even though most of the biology motivated
approaches that have been tried since the 1950s have dried up.
5. Much of AI's controversial character comes from its
implications for philosophy, a subject about which there are strong
views. AI tends to support rationalist and realistic views of
philosophical problems rather than empiricist, phenomenological or
idealist views. It encourages a piecemeal approach to the philosophy
of mind in which mental qualities are considered separately rather
than as part of a grand package. This is because some systems have
important but rather limited mental qualities.
6. There are many open problems in formalizing common sense and
many approaches to solving them awaiting exploration. 2,000 years of
philosophy has only limited relevance. In my opinion, their proper
discussion is unavoidably mostly technical, involving the actual logical
formalisms being used.
7. The situation calculus used above has important known
limitations. The $result(e,s)$ formalism has to be modified to
handle continuous time, and a quite different formalism is needed
for expressing facts about concurrent events. The {\it event calculus} of
(Kowalski and Sergot 1985) is a candidate for meeting both of these
requirements.
8. The study of AI may lead to a mathematical metaepistemology
analogous to metamathematics. Namely, one can study the relation between
a knower's rules for accepting evidence and a world in which he is
embedded. There can then be actual mathematical theorems about whether
certain intellectual strategies will discover certain facts about the
world. I think this will eventually revolutionize philosophy.
9. The best general text on the logic approach to AI is the
forthcoming (Genesereth and Nilsson 1987).
\medskip
\noindent{\bf References}
\noindent
{\bf Dennett, D.C. (1971)}: ``Intentional Systems'', {\it Journal of Philosophy}
vol. 68, No. 4, Feb. 25.
\noindent
{\bf Fikes, R. and Nils Nilsson (1971)}:
``STRIPS: A New Approach to the Application of
Theorem Proving to Problem Solving,'' {\it Artificial Intelligence}, Volume 2,
Numbers 3,4, January,
pp. 189-208.
\noindent
{\bf Genesereth, Michael and Nils Nilsson (1987)}: {\it The Logical
Foundations of Artificial Intelligence}, Morgan-Kaufman.
\noindent
{\bf Green, C., (1969)}:
``Application of Theorem Proving to Problem Solving'', in IJCAI-1, pp. 219-239.
\noindent
{\bf Kowalski, Robert (1979)}: {\it Logic for Problem Solving},
North-Holland, Amsterdam.
\noindent
{\bf Kowalski, Robert and Marek Sergot (1985)}: {\it A Logic-based Calculus of
Events}, Dept. of Computing, Imperial College, London.
\noindent
{\bf Lifschitz, Vladimir (1987)}:
``Formal Theories of Action'', to be published.
\noindent
{\bf McCarthy, John (1960)}: ``Programs with Common Sense'', in Proceedings of the
Teddington Conference on the Mechanization of Thought Processes, Her Majesty's
Stationery Office, London.
% common[e80,jmc],
% common.tex[e80,jmc]
\noindent
{\bf McCarthy, John and P.J. Hayes (1969)}: ``Some Philosophical Problems from
the Standpoint of Artificial Intelligence'', in D. Michie (ed), {\it Machine
Intelligence 4}, American Elsevier, New York, NY.
% phil[ess,jmc] with slight modifications
\noindent
{\bf McCarthy, John (1977)}:
``Epistemological Problems of Artificial Intelligence'', {\it Proceedings
of the Fifth International Joint Conference on Artificial
Intelligence}, M.I.T., Cambridge, Mass.
% ijcai.c[e77,jmc]
\noindent
{\bf McCarthy, John (1979)}:
``Ascribing Mental Qualities to Machines'' in {\it Philosophical Perspectives
in Artificial Intelligence}, Ringle, Martin (ed.), Harvester Press, July 1979.
% .<<aim 326, MENTAL[F76,JMC],mental.tex[f76,jmc]>>
\noindent
{\bf McCarthy, John (1979a)}:
``First Order Theories of Individual Concepts and Propositions'',
in Michie, Donald (ed.) {\it Machine Intelligence 9}, (University of
Edinburgh Press, Edinburgh).
% .<<aim 325, concep.tex[e76,jmc]>>
\noindent
{\bf McCarthy, John (1980)}:
``Circumscription - A Form of Non-Monotonic Reasoning'', {\it Artificial
Intelligence}, Volume 13, Numbers 1,2, April.
% .<<aim 334, circum.new[s79,jmc], cirnew.tex[s79,jmc]>>
\noindent
{\bf McCarthy, John (1986)}:
``Applications of Circumscription to Formalizing Common Sense Knowledge'',
{\it Artificial Intelligence}, April 1986.
% circum.tex[f83,jmc]
\noindent
{\bf McDermott, D. and J. Doyle, (1980)}:
``Non-Monotonic Logic I,'' {\it Artificial Intelligence\/},
Vol. 13, Numbers 1,2.
\noindent
{\bf McDermott, D. (1987)}:
``Critique of Pure Reason'', forthcoming, 1987.
\noindent
{\bf Montague, Richard (1963)}: ``Syntactical Treatments of Modality, with
Corollaries on Reflexion Principles and Finite Axiomatizability,''
{\it Acta Philosophica Fennica\/} {\bf 16}:153--167. Reprinted in (Montague 1974).
\noindent
{\bf Montague, Richard (1974)}: {\it Formal Philosophy}, Yale University Press.
\noindent
{\bf Newell, Allen (1981)}: ``The Knowledge Level,'' {\it AI Magazine\/},
Vol. 2, No. 2.
\noindent
{\bf Reiter, R.A. (1980)}: ``A Logic for Default Reasoning,'' {\it Artificial
Intelligence\/}, 13 (1,2), 81-132.
\noindent
{\bf Robinson, J. Allen (1965)}: ``A Machine-oriented Logic Based
on the Resolution Principle''. {\it JACM}, 12(1), 23-41.
\noindent
{\bf Sterling, Leon and Ehud Shapiro (1986)}: {\it The Art of Prolog}, MIT Press.
\noindent
{\bf Sussman, Gerald J., Terry Winograd, and
Eugene Charniak (1971)}: ``Micro-planner Reference Manual,'' Report AIM-203A,
Artificial Intelligence Laboratory, Massachusetts Institute of Technology,
Cambridge.
\smallskip
\centerline{Copyright \copyright\ \number\year\ by John McCarthy}
\smallskip
\noindent This draft of logic.2[w87,jmc] TEXed on \jmcdate\ at \theTime
\vfill\eject\end